Hadoop is notoriously under-documented, as I recently discovered. I am using Hadoop in my summer research position, and have launched myself into the wonderful and aggravating world of servers and open-source map-reduce programs. And one of the fun aspects of releasing open-source software, I suppose, is that no one can complain if you leave it largely undocumented.
However, this does make installing and running Hadoop a rather harrowing experience for the uninitiated. But hands-on learning is the best way! And there are some pretty good, if often incomplete or outdated, tutorials out there, including this and this.
Those, along with a few dozen web searches and hours of pain, struggle, and frustration, eventually led me to a working Hadoop installation running the standard WordCount example.
I record my efforts, failures, and discoveries now for my own benefit, as well as for anyone who might be struggling with the same.
The “No such file or directory” error
When Hadoop is set up and you attempt to start the instance using start-all.sh or start-dfs.sh, you may get the error noted above. It is likely that either your HADOOP_HOME environment variable is not set for the user Hadoop is running under, or mkdir failed to create the log directory due to permission errors.
To check for the first of these cases, type “echo $HADOOP_HOME” to see if the variable is set. If you see nothing but a blank line, or get an error telling you that the directory cannot be found, you’ll need to point this variable at your actual Hadoop installation directory (something like “/home/<user>/hadoop”, wherever you unpacked it).
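If it does turn out to be unset, a minimal fix looks something like the following; the path here is a placeholder for wherever your install actually lives:

```bash
# Check whether HADOOP_HOME is set for the user running Hadoop
echo $HADOOP_HOME

# If it prints a blank line, set it in that user's shell profile
# (the path below is a placeholder -- use your real install directory)
echo 'export HADOOP_HOME=/home/<user>/hadoop' >> ~/.bashrc
source ~/.bashrc
```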
If HADOOP_HOME prints correctly, the problem is likely permissions, and you will need to chmod the Hadoop directory so your Hadoop user can write to it. Instructions on using chmod can be found here. Remember the -R flag to include subdirectories.
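A sketch of that fix, assuming a dedicated “hadoop” user and the placeholder install path from above; chown is an alternative if you’d rather change ownership than loosen permissions:

```bash
# Open up the install tree (and especially the logs directory) recursively
sudo chmod -R 755 /home/<user>/hadoop

# Alternatively, hand the whole tree to the hadoop user outright
sudo chown -R hadoop:hadoop /home/<user>/hadoop
```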
HADOOP_OPTS and HADOOP_CLASSPATH
Contrary to what several tutorials indicate, you will likely not need to have your HADOOP_OPTS variable set — in fact, it can be empty.
On the other hand, the HADOOP_CLASSPATH should contain the location of the hadoop/lib directory, e.g. “$HADOOP_HOME/lib”.
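A minimal sketch of both settings, assuming HADOOP_HOME is already set and that your release keeps its environment settings in conf/hadoop-env.sh (the location may vary by version):

```bash
# conf/hadoop-env.sh
# HADOOP_OPTS can safely stay empty
export HADOOP_OPTS=""

# HADOOP_CLASSPATH should point at the bundled lib directory
export HADOOP_CLASSPATH=$HADOOP_HOME/lib
```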
Other small but important items
- Don’t forget your ‘sudo’. If you’re operating on files owned by a different user (say, you’re running Hadoop under a Hadoop-specific user but saving files under your standard user’s home), you’ll need to sudo most of your commands.
- Likewise, chmod all the important directories before you get started.
- The PATH environment variable must have the “bin” folder within it, e.g. “$HADOOP_HOME/bin” (a sketch follows below).
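For that last item, a minimal sketch, again assuming HADOOP_HOME points at your install:

```bash
# Put Hadoop's bin folder on PATH so start-all.sh and friends
# resolve without typing the full path
export PATH=$PATH:$HADOOP_HOME/bin

# Quick check that the scripts now resolve
which start-all.sh
```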
Good luck with your Hadooping! I will add more hints and tips as I encounter them.